In plants, heterochromatin is maintained by a small RNA-based gene silencing mechanism known as RNA-directed DNA methylation (RdDM). RdDM requires the non-redundant functions of two plant-specific DNA-dependent RNA polymerases (RNAP), RNAP IV and RNAP V. RNAP IV plays a major role in siRNA biogenesis, while RNAP V may recruit the DNA methylation machinery to target endogenous loci for silencing. Although small RNA-generating regions that depend on both RNAP IV and RNAP V have been identified previously, the genomic loci targeted by RNAP V for siRNA accumulation and silencing have not been described extensively. To characterize the RNAP V-dependent, heterochromatic siRNA-generating regions in the Arabidopsis genome, we deeply sequenced the small RNA populations of wild-type and RNAP V null mutant (nrpe1) plants. Our results showed that RNAP V-dependent siRNA-generating loci are associated predominantly with short repetitive sequences in intergenic regions. Suppression of small RNA production from short repetitive sequences was also prominent in RdDM mutants including dms4, drd1, dms3 and rdm1, reflecting the known association of these RdDM effectors with RNAP V. The genomic regions targeted by RNAP V were small, with an estimated average length of 238 bp. Our results suggest that RNAP V affects siRNA production from genomic loci with features dissimilar to known RNAP IV-dependent loci. RNAP V, along with RNAP IV and DRM1/2, may target and silence a set of small, intergenic transposable elements located in dispersed genomic regions. Silencing at these loci may be actively reinforced by RdDM.



Background 

Heterochromatin, highly condensed chromosomal DNA associated with repetitive sequences and transposons, appears to play important roles in nuclear processes such as chromosomal segregation and genomic stability. Studies in fission yeast and plants have demonstrated a role for small RNA in heterochromatin formation and maintenance. In the plant silencing mechanism called RNA-directed DNA methylation (RdDM), siRNA directs de novo cytosine methylation at homologous DNA regions. These RNA-based silencing mechanisms provide an important level of epigenetic control to repress transposable elements (TEs) and aberrant genes such as transgenes, while also playing a role in regulating the expression of endogenous genes.

In plants, two nuclear DNA-dependent RNA polymerases have evolved to work exclusively in RNA-mediated silencing pathways. DNA-dependent RNA polymerases IV and V (RNAP IV and RNAP V) each have a unique largest subunit (NRPD1 and NRPE1, respectively) yet share the second-largest subunit (NRPD2/NRPE2); they also have many subunits that are shared with, or paralogous to, those of RNAP II. Although mutations in NRPD1, NRPE1 and NRPD2/NRPE2 have no obvious phenotypic impact, selective activation of transposons and other repeats has been observed in these mutants. RNAP IV functions upstream in the RdDM pathway, together with RNA-DEPENDENT RNA POLYMERASE 2 (RDR2) and DICER-LIKE 3 (DCL3), to generate 24 nt siRNA from heterochromatic loci, while RNAP V functions downstream with ARGONAUTE 4 (AGO4)-associated siRNA to facilitate de novo DNA methylation at siRNA target loci. Interestingly, recent work has demonstrated that RNAP II transcriptional activity is also involved in siRNA-directed gene silencing via interactions with an AGO4-siRNA complex. Although it is unclear how RNAP IV, RNAP V and RNAP II activities are functionally integrated in heterochromatin silencing, these studies suggest a delicate coordination, and yet a functional diversification, of these three polymerases in RdDM.

RdDM likely has three major steps: (1) siRNA biogenesis, (2) production of non-coding transcripts as a scaffold and (3) assembly of AGO-siRNA effector complexes to recruit methylation machinery and target genomic loci. In siRNA biogenesis, heterochromatic regions are likely transcribed by RNAP IV, made double-stranded by RDR2 and processed into 24 nt siRNA by DCL3; finally, the siRNA population is loaded into AGO4, and probably AGO6, to form an AGO-siRNA complex. At some of these loci, RNAP V likely generates non-coding transcripts that could serve as the scaffolds. Other RdDM effectors involved in scaffold RNA generation include RNAP II, an IWR1-like transcriptional factor DEFECTIVE IN MERISTEM SILENCING 4 (DMS4),27 and the "DDR complex", which contains the SNF2-like chromatin-remodeling factor DEFECTIVE IN RNA-DIRECTED DNA METHYLATION 1 (DRD1), a structural-maintenance-of-chromosomes hinge domain-containing protein DEFECTIVE IN MERISTEM SILENCING 3 (DMS3), and the novel protein RNA-DIRECTED DNA METHYLATION 1 (RDM1), which binds methylated single-stranded DNA. The resulting scaffold RNAs and the AGO-siRNA complex form a guiding complex together with an SPT5-like transcriptional elongation factor (KOW-DOMAIN CONTAINING TRANSCRIPTION FACTOR 1, KTF1), the zinc-finger domain protein INVOLVED IN DE NOVO 2 (IDN2) and the DDR complex. Finally, through a mechanism that is not fully understood, the guiding complex recruits DNA methyltransferases and histone methyltransferases, such as DOMAINS REARRANGED METHYLASE 1 and 2 (DRM1 and DRM2), CHROMOMETHYLASE 3 (CMT3), SUPPRESSOR OF VARIEGATION 3-9 HOMOLOG 9 (SUVH9) and SUVH2, to direct the silencing of specific genomic loci, resulting in both the generation and maintenance of heterochromatin.

Unlike that of RNAP IV, the function of RNAP V in siRNA production is less clear. It has been suggested that the role of RNAP V in RNA-directed gene silencing is to promote DNA methylation by recruiting the silencing complex to siRNA-targeted loci.15,16,19,28 Yet, studies have shown that RNAP V mutants have reduced small RNA levels at some loci, indicating a role for RNAP V in siRNA accumulation that is not redundant with RNAP IV. Heterochromatic silencing may require two rounds of siRNA production: while RNAP IV is necessary for the production of primary siRNA for de novo methylation at endogenous loci, RNAP V targeting is crucial for the production of siRNA from methylated loci for the reinforcement and possibly spreading of methylation. Although siRNA production seems to rely largely on RNAP IV, the presence of RNAP IV-dependent but RNAP V-independent siRNA-generating loci implies that RNAP IV and RNAP V direct distinct siRNA activities at endogenous loci. Analysis of transgene-targeting siRNAs shows that a loss of function of the RdDM proteins responsible for scaffold RNA generation, including DMS4, DRD1, DMS3 and RDM1, impacts secondary siRNA production from the region downstream of the RdDM target, 24 nt siRNA biogenesis from RNAP IV- and RNAP V-dependent loci, and recruitment of the RdDM machinery by RNAP V. Thus, we hypothesized that RNAP V may direct siRNA generation and target methylation at specific genomic loci, possibly representing endogenous equivalents of the transgene-derived secondary siRNAs.

To characterize RNAP V-dependent, heterochromatic siRNA-generating regions in Arabidopsis, we employed next-generation sequencing to deeply sequence the small RNA population in wild-type and RNAP V null mutant (nrpe1) plants. Our results showed that RNAP V-dependent siRNA-generating loci are associated predominantly with short repetitive sequences in intergenic regions, with an overrepresentation of short interspersed elements (SINEs) and rolling circle/helitron (RC/Helitron) sequences. Suppression of small RNA production from short repetitive sequences was also prominent in RdDM mutants, including dms4, drd1, dms3 and rdm1 mutants, consistent with the functional association of these effectors with RNAP V. The genomic regions targeted by RNAP V were generally small, with the average impacted region spanning only a few hundred base pairs. Our results suggest that RNAP V affects siRNA production from specific loci with genome features dissimilar to known RNAP IV-dependent loci, indicating that RNAP V operates on specific genomic loci for siRNA production in RdDM.

Results 

Identification of > 2,000 RNAP V-dependent small RNA loci by small RNA profiling. To investigate the role of RNAP V in RdDM, we analyzed small RNA from Arabidopsis immature inflorescences of wild-type Columbia (ecotype Col-0, hereafter "Col") and a mutant of the largest RNAP V subunit, NRPE1 (allele nrpd1b-11, hereafter "nrpe"). We generated small RNA libraries from three biological replicates of Col or nrpe samples (Table S1) for Illumina sequencing. The raw small RNA sequences were trimmed to remove adaptor sequences and matched to the Arabidopsis genome (TAIR version 9). To compare between libraries, the abundance of each small RNA was normalized to reads per five million (RP5M). For replicates of Col and nrpe libraries, we obtained 1.3 to 6.2 million total genome-matched small RNA sequences, corresponding to 0.5 to 2.1 million distinct sequences (each different sequence counted once in the data set). We used Spearman's correlation to assess the reproducibility of small RNA data sets in replicates by pairwise comparison (Fig. S1). The correlation coefficient rho among Col or nrpe libraries was high (approximately 0.73 to 0.80), indicating a strong correlation among replicate libraries. The proportion of distinct vs. total genome-matched reads reflects the diversity and complexity of the small RNA population captured in the library; it ranged from 0.346 to 0.455 for wild-type and from 0.350 to 0.439 for the nrpe samples (Table S1A). This small RNA complexity is much lower than that reported in a previous study by Mosher et al. (Table S1B), probably because our sequencing depth is ~100-fold greater owing to improved technologies. More importantly, the proportion of distinct small RNA reads was similar between wild-type and nrpe mutant libraries (median values 0.346 and 0.352, respectively), in agreement with the results from Mosher et al. (Table S1). The reduction in small RNA complexity in nrpe libraries was minimal compared with the substantial reductions previously reported in an Arabidopsis rdr2 mutant (proportion of distinct = 0.04 or 0.05) (Table S1).

Figure 1. Small RNA-generating loci are suppressed in nrpe mutants. (A) Small RNA size profiles in wild-type (Col) and a null mutant of NRPE1 (nrpe) replicate libraries, normalized to the percentage of 21 nt abundance in wild-type libraries. For each size class, small RNA abundance (excluding structural RNA) was calculated as a percentage of the summed abundance of total genome-matched reads. (B) Averaged percentage of abundances in small RNA size classes, calculated from data in (A); "Col avg" and "nrpe avg" indicate the averaged values for each set of three libraries. (C) Number of clusters impacted in the nrpe mutant, based on summed small RNA abundances per cluster. For each cluster, we compared the average small RNA HNA of three Col libraries to the average small RNA HNA of three nrpe libraries. The ratio of Col vs. nrpe is used when the HNA of Col libraries is greater than that of nrpe libraries, while the ratio of nrpe vs. Col is used when the HNA of nrpe libraries is greater than that of Col libraries (shown as negative values). The inset graph (note the reduced y-axis) expands to the full range of the fold differences to explain the high value of the "> 10" column; the basis for the high "> 10" bar is a very long tail of low-frequency clusters highly impacted in nrpe. (D) Genic vs. non-genic and repeat- vs. non-repeat-associated characteristics of RNAP V-dependent small RNA clusters based on the TAIR version 9 annotations. RNAP V-dependent ("RNAP V-dpt") and RNAP V-independent ("RNAP V-indpt") clusters were defined as described in the text. A total of 11,667 small RNA-generating, 2,201 RNAP V-dependent and 7,680 RNAP V-independent clusters were analyzed.

In wild-type Arabidopsis inflorescences, the ratio of 24 to 21 nt small RNA abundances is typically about 3:1; most 24-mers are a diverse population of heterochromatic siRNAs, and most 21-mers are highly abundant miRNAs. Our results showed that the proportion of 24 nt small RNA abundance was reduced in all nrpe libraries compared with all wild-type libraries (Fig. 1A), to a ratio of 1.62:1 for the averaged replicates (Fig. 1B), a reduction of more than 1.5-fold in the nrpe data set. This impact on 24 nt siRNA is reminiscent of an rdr2 mutant, but much less severe: the Arabidopsis rdr2 mutant showed a ratio of 0.16:1 for 24 to 21 nt abundances, a nearly complete (18-fold) reduction relative to wild-type.
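The size-class percentages, the 24:21 ratio and the distinct-to-total complexity measure used above are simple ratios over read counts. The Python sketch below shows one way to compute them; the input format (a dictionary mapping each genome-matched sequence to its raw read count, with structural RNAs already removed) and the function names are hypothetical, not the authors' pipeline.

```python
from collections import Counter


def size_profile(reads):
    """Percentage of total abundance in each small RNA size class.

    reads: hypothetical dict mapping each distinct genome-matched sequence
    (structural RNAs already removed) to its raw read count.
    """
    total = sum(reads.values())
    by_length = Counter()
    for seq, count in reads.items():
        by_length[len(seq)] += count
    return {length: 100.0 * n / total for length, n in sorted(by_length.items())}


def ratio_24_to_21(profile):
    """Ratio of 24 nt to 21 nt abundance, e.g. about 3.0 for a typical 3:1 library."""
    return profile.get(24, 0.0) / profile[21]


def complexity(reads):
    """Proportion of distinct sequences among all genome-matched reads."""
    return len(reads) / sum(reads.values())
```

With the averaged values reported above, a drop from about 3:1 to 1.62:1 corresponds to 3 / 1.62, or roughly a 1.85-fold reduction, consistent with the stated reduction of more than 1.5-fold.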

Given the incomplete reduction in 24 nt siRNA abundance and the minimal impact on complexity, and knowing that RNAP V has a known but secondary role in heterochromatic siRNA biogenesis, we asked which genomic loci showed reduced siRNA levels in the nrpe mutant. To identify the RNAP V-dependent, siRNA-generating regions in the genome, we deployed a proximity-based algorithm to group and quantify clusters of small RNAs. In Arabidopsis, a total of 239,339 adjacent, non-overlapping, fixed-size (500 bp) bins or "clusters" were defined to cover the complete nuclear genome. A "hits-normalized abundance" (HNA) was calculated by dividing the normalized abundance (in RP5M) of each small RNA by the number of genomic locations to which it maps (its "hits"). In the cluster analysis, the HNA values of all small RNAs mapping to a given 500 bp cluster were summed separately for each library, regardless of small RNA size. Finally, the summed abundance in each cluster was averaged for the three Col replicate libraries and compared with the averaged value of the three nrpe libraries. Since we had demonstrated a strong correlation among the three replicates, we used the averaged replicate library abundances for all subsequent analyses. The cluster analysis summarizes the small RNA abundance within specific genomic regions where siRNA may be produced, and also provides a practical way to compare the same genomic region across different libraries. Furthermore, each cluster was annotated with gene and repeat information (e.g., TAIR9-annotated retrotransposons and DNA transposons), which allowed us to characterize the genomic features associated with particular small RNA-generating loci.
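The cluster quantification described above (RP5M normalization, division by the number of hits, summation into fixed 500 bp bins and averaging over replicates) can be sketched in Python as follows. The input layout and function names are assumptions for illustration, and so is the choice to assign each hit to a bin by its start coordinate; the text does not specify how reads spanning a bin boundary were handled.

```python
from collections import defaultdict

BIN_SIZE = 500         # fixed, non-overlapping cluster width used in the text
SCALE = 5_000_000      # reads per five million (RP5M)


def rp5m(raw_count, total_genome_matched):
    """Normalize a raw read count to reads per five million genome-matched reads."""
    return raw_count * SCALE / total_genome_matched


def cluster_hna(library, total_genome_matched):
    """Sum hits-normalized abundance (HNA) per 500 bp cluster for one library.

    library: iterable of (raw_count, hits), where hits lists every genomic
    match of that small RNA as (chromosome, 1-based start). Assumption: each
    hit is assigned to the bin containing its start coordinate.
    """
    clusters = defaultdict(float)
    for raw_count, hits in library:
        hna = rp5m(raw_count, total_genome_matched) / len(hits)
        for chrom, start in hits:
            clusters[(chrom, (start - 1) // BIN_SIZE)] += hna
    return dict(clusters)


def average_replicates(replicates):
    """Average per-cluster HNA over replicate libraries (missing clusters count as 0)."""
    keys = set().union(*replicates)
    return {k: sum(r.get(k, 0.0) for r in replicates) / len(replicates) for k in keys}
```

Dividing by the number of hits means a multi-mapping siRNA from a repeat family contributes its abundance only once in total, spread across all of its genomic copies.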

Based on the result of the cluster analysis, we identified genomic loci for which small RNA production was dependent on RNAP V. We selected 11,667 of the 239,339 clusters at which the summed HNA of Col and nrpe libraries was greater than a baseline value (HNA = 100 RP5M), excluding regions from which minimal small RNAs are produced (Table S2). Next, we compared the small RNA abundance of each cluster in wild-type and nrpe libraries and calculated the fold difference in HNA between them. As shown in Figure 1C and Table S2, the majority of small RNA-generating loci were not impacted in the nrpe mutant, with 65% of the clusters showing a less than 2-fold difference from the control. Only 25 small RNA-generating clusters were greatly upregulated in nrpe compared with the wild-type, with at least 10-fold higher levels in nrpe (a review of these loci revealed no obvious pattern or significant characteristic, so they were not considered further). Notably, we observed a clear bias toward downregulated clusters in the nrpe mutant, with 18% of the total (2,201 clusters) exhibiting at least a 10-fold higher small RNA abundance in Col than in nrpe libraries (Fig. 1C). An examination of a randomly selected set of clusters with small RNA abundances at least 10-fold higher in wild-type showed that the differences between mutant and wild-type can be substantial (50-fold or more) (Table S3). Next, all small RNA-producing clusters were classified based on overall abundance in wild-type and nrpe libraries (Fig. 1C; Table S2). In the nrpe-suppressed loci, the abundance of small RNAs per impacted cluster was lower than in clusters not impacted in the mutant; 1,427 of 2,201 (65%) had a summed HNA from 101 to 250 RP5M, and 745 of 2,201 (34%) had a summed HNA from 250 to 1,000 RP5M. Correspondingly, only 29 of the 1,171 clusters (2.5%) with a summed HNA exceeding 1,000 RP5M were greatly impacted in nrpe (Table S2). Thus, among the 2,201 clusters impacted in nrpe, the effect was greater on small RNA clusters of low to moderate abundance. For the purposes of this analysis, we defined these 2,201 clusters as "RNAP V-dependent," since their small RNA accumulation was reduced at least 10-fold in the absence of NRPE1.
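The cluster categories used in this section follow from a signed fold change (as defined in the Figure 1C legend) combined with the 100 RP5M baseline and the 10-fold and 2-fold cut-offs. A minimal Python sketch is below; the pseudocount that avoids division by zero when a cluster has no reads in one genotype is an assumption, since the text does not say how such clusters were handled.

```python
def signed_fold_change(col, nrpe, pseudocount=1.0):
    """Col/nrpe when Col is higher; -(nrpe/Col) when nrpe is higher (Fig. 1C convention).

    The pseudocount is a hypothetical guard against zero abundances.
    """
    c, m = col + pseudocount, nrpe + pseudocount
    return c / m if c >= m else -(m / c)


def classify_cluster(col_hna, nrpe_hna, baseline=100.0, pseudocount=1.0):
    """Assign a cluster (replicate-averaged HNA in RP5M) to the categories used in the text."""
    if col_hna + nrpe_hna <= baseline:
        return "below baseline"          # excluded: minimal small RNA production
    fc = signed_fold_change(col_hna, nrpe_hna, pseudocount)
    if fc >= 10:
        return "RNAP V-dependent"        # at least 10-fold lower in nrpe
    if fc <= -10:
        return "upregulated in nrpe"     # at least 10-fold higher in nrpe
    if abs(fc) < 2:
        return "RNAP V-independent"      # less than 2-fold change
    return "intermediate"
```

Encoding nrpe-higher clusters as negative ratios keeps both directions on one axis, which is how Figure 1C bins the clusters.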

Overrepresentation of SINE and RC/Helitron repeats in RNAP V-dependent small RNA loci. We were interested in characterizing the types of genomic regions that depend on RNAP V for small RNA biogenesis, which represented only 18% of all small RNA-generating loci in the Arabidopsis genome. For comparison, we defined 7,680 clusters that showed minimal changes between the Col and nrpe data sets (within ±2-fold difference in HNA) as "RNAP V-independent" clusters. First, the RNAP V-dependent and RNAP V-independent clusters were categorized based on the gene and repeat annotations. Of the small RNA-generating clusters in the wild-type libraries, less than half (42%) were located in genic regions and 58% were in intergenic regions, suggesting a slight intergenic bias (Fig. 1D). Of the RNAP V-independent clusters, half were located in genic regions and the other half in intergenic regions. However, only 27% of RNAP V-dependent clusters were in genic regions, while the majority (73%) were in intergenic regions, indicating a strong intergenic bias. This result was not unexpected, as previous studies had focused on intergenic regions as the targets of RNAP V and RNAP II activity in RdDM. When we compared the clusters to repeat annotations, RNAP V-dependent clusters were not more repetitive than RNAP V-independent clusters or than small RNA-generating clusters in wild-type (Fig. 1D).

Figure 2. Genome-wide distributions of RNAP V-dependent small RNA-generating loci. Upper part, the distribution of small RNA-generating loci and small RNA abundance in wild-type along chromosome 1, with bar height indicating the sum of HNA in Col libraries; red bars indicate HNA greater than 500; full-height bars are HNA between 100 and 500, and shorter bars are HNA < 100. The chromosome is illustrated below in gray, with approximate pericentromeric regions based on centromeric staining data marked with green bars. Middle part, differentially expressed RNAP V-dependent small RNA-generating clusters; full-height black bars are clusters with a reduction relative to Col of between 10- and 50-fold; red bars indicate clusters with fold reduction > 50. Lower part, the density of RNAP V-dependent clusters across chromosome 1, plotted as the number of clusters in 1 Mb windows as a percentage of the total number of RNAP V-dependent clusters mapped to chromosome 1. The distributions of small RNA-generating clusters and of NRPD1 (RNAP IV)- and RDR2-dependent clusters are plotted for comparison. Data for chromosomes 2 to 5 are shown in Figure S2.

Prior reports had implied that RNAP V-dependent regions might be more distal to the centromeres, so we examined the genomic distribution of these RNAP V-dependent clusters. In wild-type Arabidopsis, small RNAs are known to be most abundant in the transposable element-rich pericentromeric regions of the chromosomes (Fig. 2; Fig. S2, upper and lower parts). The distribution of RDR2-dependent clusters from an analysis of an rdr2 mutant coincided with the distribution of small RNA-generating clusters in wild-type plants, with both exhibiting a strong pericentromeric localization. In contrast, RNAP V-dependent clusters were less concentrated in pericentromeric regions than RDR2-dependent clusters (Fig. 2; Fig. S2, middle and lower parts), suggesting a more dispersed and possibly euchromatic chromosomal distribution of the RNAP V-dependent loci generating these small RNAs. The genomic locations of RNAP V-dependent and RNAP IV-dependent clusters were very similar (Fig. 2; Fig. S2); while the Mosher et al. (2008) analysis suggested a pericentromeric bias for RNAP IV-dependent regions, our data are not directly comparable because of the low depth of their sequencing data. Nonetheless, both our and their studies indicate that RNAP V-dependent loci are predominantly dispersed across the chromosomes in non-pericentromeric regions.
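The density track in the lower part of Figure 2 expresses the number of RNAP V-dependent clusters in each 1 Mb window as a percentage of all such clusters on the chromosome. A short Python sketch, assuming each cluster is represented by its 1-based start coordinate (a hypothetical input format):

```python
from collections import Counter

WINDOW = 1_000_000  # 1 Mb windows, as in Figure 2


def window_density(cluster_starts):
    """Percentage of one chromosome's clusters that fall in each 1 Mb window.

    cluster_starts: 1-based start coordinates of RNAP V-dependent clusters on a
    single chromosome. Windows with no clusters are omitted from the result.
    """
    counts = Counter((start - 1) // WINDOW for start in cluster_starts)
    total = len(cluster_starts)
    return {window: 100.0 * n / total for window, n in sorted(counts.items())}
```

Running the same calculation on all small RNA-generating clusters or on RDR2-dependent clusters gives directly comparable profiles, which is how a pericentromeric concentration can be distinguished from a dispersed distribution.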

Since heterochromatic siRNAs produced by RNAP IV and RNAP V have been shown to be associated with certain repetitive sequences, we examined the types of repeats impacted in nrpe. We analyzed the 2,201 RNAP V-dependent clusters and 7,680 RNAP V-independent clusters using the repeat annotation of the Arabidopsis TAIR9 genome, which allowed us to associate individual clusters with any overlapping repeat type. We also performed this analysis using repeats identified by RepeatMasker, and our conclusions were not substantially different (data not shown); the analyses described here were generated using the TAIR-annotated repeats. Among the clusters generating small RNAs at a level greater than 100 RP5M, 59% were repeat-associated, while 41% (4,740 of 11,667) were not associated with known repeats (Table S4). Similarly, 59% of RNAP V-dependent clusters were repeat-associated, while 41% (904 of 2,201) were not associated with repetitive sequences. The percentage of clusters not associated with repeats was quite similar between RNAP V-independent and RNAP V-dependent clusters (42% and 41%, respectively; Table S4), suggesting that RNAP V-dependent, siRNA-generating regions were no more or less repetitive than either wild-type or RNAP V-independent regions.
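Associating each 500 bp cluster with any overlapping repeat type amounts to an interval-overlap test between the cluster and the repeat annotation. The Python sketch below uses a linear scan for clarity; the tuple layout for repeats is a hypothetical stand-in for a parsed annotation file, and a genome-wide analysis would more likely use an interval tree or a sorted sweep.

```python
BIN_SIZE = 500


def overlaps(a_start, a_end, b_start, b_end):
    """Closed, 1-based intervals overlap unless one ends before the other starts."""
    return a_start <= b_end and b_start <= a_end


def repeat_classes(chrom, bin_index, repeats, bin_size=BIN_SIZE):
    """Repeat classes overlapping one cluster; an empty set means not repeat-associated.

    repeats: list of (chrom, start, end, repeat_class) tuples (hypothetical format).
    """
    start = bin_index * bin_size + 1
    end = start + bin_size - 1
    return {cls for r_chrom, r_start, r_end, cls in repeats
            if r_chrom == chrom and overlaps(start, end, r_start, r_end)}


def fraction_repeat_associated(clusters, repeats):
    """Fraction of (chrom, bin_index) clusters overlapping at least one repeat."""
    hits = sum(1 for chrom, b in clusters if repeat_classes(chrom, b, repeats))
    return hits / len(clusters)
```

Because a repeat can span a bin boundary, one element may annotate two adjacent clusters, and one cluster may carry several repeat classes.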

Figure 3. SINEs are overrepresented in nrpe-suppressed siRNA-generating loci. (A) Classes of repetitive sequences represented in RNAP V-dependent small RNA-generating clusters. RNAP V-dependent and RNAP V-independent clusters are defined as described in the Figure 1 legend. Repetitive sequences are identified based on the Arabidopsis TAIR9 repeat annotation. The percentage of repeat-annotated clusters relative to the number of RNAP V-dependent or RNAP V-independent loci is shown. The change in repeat class representation is calculated as the ratio of its percentage in RNAP V-dependent vs. RNAP V-independent clusters (in red) or as the ratio of its percentage in RNAP V-independent vs. RNAP V-dependent clusters (in blue), indicating whether the repeat class is over- or underrepresented in RNAP V-dependent loci, respectively. (B) Overrepresentation of SINEs in RNAP V-dependent clusters. The number of SINE-annotated clusters is plotted against the fold difference in small RNA abundance between wild-type and nrpe libraries. The number of RC/Helitron-annotated clusters, another repeat class that is overrepresented in RNAP V-dependent clusters, is also plotted.

Although the overall level of repeat association is similar between RNAP V-dependent and independent loci, certain classes of repeats were much more prevalent in RNAP V-dependent clusters than in the control set, and vice versa. Most notably, SINEs were associated with 0.1% of RNAP V-independent clusters but with 7.1% of RNAP V-dependent clusters, a 54-fold difference (Fig. 3A). The RC/Helitron, DNA/Mariner, DNA/hAT, DNA/Harbinger and LINE/L1 elements were all represented at a higher percentage in RNAP V-dependent loci than in RNAP V-independent loci (Fig. 3A; Table S4). On the other hand, LTR/Copia- and LTR/Gypsy-associated clusters were underrepresented in RNAP V-dependent clusters, with the greatest difference in representation for LTR/Gypsy repeats: 16.1% in the control set vs. 1.0% in RNAP V-dependent clusters (a 17-fold difference). Therefore, fewer LTR-associated clusters were represented among the highly suppressed small RNA-generating loci in the RNAP V mutant, indicating that small RNA production from LTR-associated loci was less likely to depend on RNAP V.
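The over- and underrepresentation values in Figure 3A compare the percentage of clusters carrying each repeat class in the RNAP V-dependent and RNAP V-independent sets, always dividing the larger percentage by the smaller. A Python sketch of that calculation, with a hypothetical input of one set of overlapping repeat classes per cluster (classes absent from either set would need separate handling to avoid division by zero):

```python
from collections import Counter


def class_percentages(cluster_classes):
    """Percentage of clusters annotated with each repeat class.

    cluster_classes: one set of overlapping repeat classes per cluster
    (an empty set for clusters with no repeat).
    """
    counts = Counter(cls for classes in cluster_classes for cls in classes)
    return {cls: 100.0 * n / len(cluster_classes) for cls, n in counts.items()}


def representation_change(pct_dependent, pct_independent):
    """('over', dep/indep) or ('under', indep/dep), following the Figure 3A convention."""
    if pct_dependent >= pct_independent:
        return "over", pct_dependent / pct_independent
    return "under", pct_independent / pct_dependent
```

Because the percentages quoted in the text are rounded, ratios recomputed from them (for example 7.1 / 0.1 for SINEs) will not exactly reproduce the reported fold differences.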

Our results showed that SINEs were the most overrepresented repeat class associated with RNAP V-dependent clusters relative to RNAP V-independent clusters. RC/Helitron, the most abundant repeat class associated with RNAP V-dependent clusters, is also overrepresented by nearly 2-fold in RNAP V-dependent loci. SINE and RC/Helitron elements are highly abundant, small transposable elements in the Arabidopsis genome, which led us to ask how many of these overlap small RNA-generating clusters affected by the nrpe mutation. Most of the 188 small RNA-generating, SINE-associated clusters had low-to-medium abundance, and 156 (83%) were greatly impacted in nrpe mutants, with at least a 10-fold reduction in small RNA abundance (Fig. 3B; Table S4). Similarly, 28% (492 of 1,757) of small RNA-generating RC/Helitron-type loci showed greatly reduced small RNA abundance in the nrpe mutant (Fig. 3B; Table S4). Although 11% of RC/Helitron-type small RNA-generating loci had high small RNA abundance (HNA > 1,001 RP5M), almost all the RC/Helitron-associated loci impacted in the RNAP V mutant had low to moderate levels of small RNAs, suggesting that the RNAP V dependency was greater for SINE- or RC/Helitron-type clusters of low-to-moderate small RNA abundance (Fig. 3B). For other repeats overrepresented among nrpe-suppressed loci, such as DNA/Mariner and DNA/hAT, a substantial proportion of their small RNA-generating loci were also strongly suppressed in nrpe (57% for DNA/Mariner and 25% for DNA/hAT) (Table S4). Conversely, only 1.6% to 10% of small RNA-generating clusters of LTR-type repeats (LTR/Copia and LTR/Gypsy) were impacted in RNAP V mutants. Taken together, these results showed that several repeat classes, including SINEs, exhibit high RNAP V dependency, with 20% to 80% of their small RNA-generating loci heavily suppressed in the nrpe mutant. We found it curious that different classes of transposable elements (e.g., DNA transposons and retrotransposons) were affected in the nrpe mutant in a similar fashion. The relatively small size of some of these types of elements suggested a possible correlation between repeat length and the impact on small RNA abundance.
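The per-class percentages above (for example 156 of 188 SINE-associated clusters, or 83%) are the fraction of a class's small RNA-generating loci whose fold change reaches the 10-fold cut-off. A Python sketch, assuming a hypothetical list of (repeat class, signed fold change) pairs computed as described earlier in this section:

```python
from collections import Counter


def impacted_fraction_by_class(records, threshold=10.0):
    """Per repeat class: (number impacted, number total, percent impacted).

    records: iterable of (repeat_class, signed_fold_change) for small
    RNA-generating loci; a locus is "impacted" when Col/nrpe >= threshold.
    """
    totals, impacted = Counter(), Counter()
    for cls, fold in records:
        totals[cls] += 1
        if fold >= threshold:
            impacted[cls] += 1
    return {cls: (impacted[cls], n, 100.0 * impacted[cls] / n)
            for cls, n in totals.items()}
```

Feeding it 156 impacted and 32 unimpacted SINE loci returns (156, 188, about 83.0) for the SINE class.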

RNAP V-dependent small RNA-generating loci comprise short genomic regions. The enrichment of short repeat classes such as SINE and RC/Helitron and the dearth of long LTR species in RNAP V-dependent loci led us to speculate about the role of repeat size in RNAP V dependency. Prior reports have indicated that it may be difficult to maintain the silenced state of regions just a few nucleosomes in length; hence, we suspected that RNAP V may be responsible for the silencing of small repetitive regions. We used several approaches to analyze the relationship between size and RNAP V dependency. In the first analysis, we compared the total length of each repeat with the reduction in small RNA abundance in the nrpe mutant. The repeat element boundaries (start and end coordinates) were defined by the repeat annotation of the TAIR9 genome, and the total hits-normalized small RNA abundance was determined for each repeat element for both wild-type and nrpe libraries. We selected 3,826 elements with small RNA abundances above the baseline (HNA of Col > 100 and HNA of nrpe > 5 RP5M). For these elements, we plotted the fold difference in HNA between wild-type and mutant libraries against the lengths of the repeat elements (Fig. 4A). The majority of highly nrpe-suppressed small RNA-generating loci were associated with repetitive sequences 5 kb or less in length. We next plotted the fold difference in HNA for repeat elements under 5 kb in 500 bp length increments (Fig. 4B). About one-third (1,395 of 3,826) of the small RNA-generating repeat elements were shorter than 1 kb, and of those, 536 showed at least a 10-fold reduction in small RNA abundance in the nrpe mutant (Fig. 4B). These data indicate that repeat length was inversely correlated with the degree of nrpe-associated suppression of small RNA abundance, and that small RNA production in the nrpe mutant was mostly impacted at shorter repeats.
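The element-level analysis above applies a baseline filter (Col HNA > 100 and nrpe HNA > 5 RP5M), computes each element's length from its annotated start and end, and groups elements of 5 kb or less into 500 bp length bins. A Python sketch of that grouping, counting per bin how many elements reach a 10-fold reduction; the tuple layout and function name are hypothetical:

```python
def length_bin_summary(elements, bin_size=500, max_len=5000, threshold=10.0):
    """Count small RNA-generating repeat elements per length bin.

    elements: iterable of (start, end, col_hna, nrpe_hna) with 1-based,
    inclusive coordinates. Returns {bin_index: (n_elements, n_impacted)},
    where bin 0 holds elements of 1-500 bp, bin 1 holds 501-1,000 bp, etc.
    """
    summary = {}
    for start, end, col, nrpe in elements:
        if not (col > 100 and nrpe > 5):    # baseline filter from the text
            continue
        length = end - start + 1
        if length > max_len:                # keep elements of 5 kb or less
            continue
        b = (length - 1) // bin_size
        n, n_impacted = summary.get(b, (0, 0))
        summary[b] = (n + 1, n_impacted + (1 if col / nrpe >= threshold else 0))
    return dict(sorted(summary.items()))
```

Since the filter requires a nonzero nrpe abundance, the Col/nrpe ratio is always defined here without a pseudocount.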

Next, we analyzed repeat length within transposon families relative to the impact in the nrpe mutant. Transposons are remarkably heterogeneous in size, copy number and diversity within a repeat family. To investigate whether the negative correlation between repeat size and RNAP V dependency also occurred within transposon families, we first separated the 3,826 elements into repeat classes and plotted the number of nrpe-suppressed elements against the total number of small RNA-generating repeat elements (Fig. 4C). RC/Helitron and SINE had the highest numbers of repeats suppressed in the nrpe mutant, and the nrpe mutation affected more than 87% of small RNA-generating SINEs (Fig. 4C). This result agreed with the trend of nrpe suppression observed in the static cluster analysis using a fixed width of 500 bp (Table S4). Once the transposable elements were categorized into repeat classes, we examined nrpe-associated suppression using small RNA-generating repeats grouped by length in 500 bp increments (Fig. 4D; Fig. S3). For every repeat class we examined in the nrpe mutant, small RNA production was clearly impacted mostly at the shorter members of the class. Therefore, our data indicate that RNAP V dependency may be determined mostly by repeat size, regardless of repeat type.
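Comparing sizes within each repeat class reduces to grouping elements by class and comparing the median length of all small RNA-generating elements with that of the RNAP V-dependent subset. A Python sketch using the standard library's `statistics.median`, with a hypothetical (class, length, fold change) input:

```python
from collections import defaultdict
from statistics import median


def median_lengths_by_class(elements, threshold=10.0):
    """Per repeat class: (median length of all elements, median of impacted ones).

    elements: iterable of (repeat_class, length_bp, fold_change). The second
    value is None when no element of that class reaches the threshold.
    """
    all_lengths = defaultdict(list)
    impacted_lengths = defaultdict(list)
    for cls, length, fold in elements:
        all_lengths[cls].append(length)
        if fold >= threshold:
            impacted_lengths[cls].append(length)
    return {cls: (median(lengths),
                  median(impacted_lengths[cls]) if impacted_lengths[cls] else None)
            for cls, lengths in all_lengths.items()}
```

A class whose impacted median sits well below its overall median matches the within-class pattern described above, in which shorter members of a repeat class are the ones losing small RNAs in nrpe.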

Figure 4. Repeat length is inversely correlated with the suppression of small RNA production in the nrpe mutant. (A) Plot of nrpe-dependent suppression of small RNA abundance vs. the length of repetitive sequences. The HNA was summed over the small RNAs mapped within each annotated repetitive sequence, i.e., small RNA abundance for each repeat. The y-axis shows the log10 scale of the fold difference in summed HNA between wild-type and nrpe libraries. The best-fit trendline from the power regression is shown. The section within 5 kb repeat length is further expanded and shown in (B). (B) Degree of nrpe-dependent suppression. The width of the boxes is proportional to the square root of the number of repeats. (C) Numbers of RNAP V-dependent, small RNA-generating repeats (dark red) vs. the total number of small RNA-generating repeats (light red) in different repeat classes are shown, along with the percentage (the former vs. the latter). (D) Comparison of the numbers of RNAP V-dependent, small RNA-generating repeats (red) and the total number of small RNA-generating repeats (light red) across repeat sizes in selected repeat families. (E) Length profiles of repeat-annotated small RNA-generating elements, plotted with fixed-width boxes. White boxes represent data from small RNA-generating elements, and red boxes represent data from RNAP V-dependent, small RNA-generating elements based on the criteria described in the main text. Median values for both data sets are shown. Repeat lengths between 250 bp and 300 bp are highlighted in blue. For the box plots in (B and E), the box spans the 25th to 75th percentile, with the center bar indicating the median. The dashed lines indicate the range of values in the lower or upper quartiles, terminating at the minimum or maximum values.

To assess the correlation of repeat length and nrpe-associated siRNA suppression, we calculated, for each repeat class, the median length of repeats generating small RNAs in wild-type and of the subset of elements greatly impacted in nrpe (Fig. 4E). The range of sizes in the genome varies among repeat classes: LTR/Gypsy and LTR/Copia are the longest, with medians exceeding 2 kb; RC/Helitron, LINE/L1 and several types of DNA transposons have medians around 1 kb; and the median size of SINEs is around 300 bp (Fig. 4E, white boxes). When the size ranges of RNAP V-dependent elements were plotted for comparison, it was clear that RNAP V-dependent elements were predominantly the shorter members of each repeat type (Fig. 4E, red boxes), in agreement with our previous results (Fig. 4D). Perhaps the most revealing was the LTR/Gypsy class: although these repeat elements are the least impacted in the nrpe mutant (Fig. 3A), the 22 LTR/Gypsy elements (of 1,322 small RNA-producing LTR/Gypsy elements) that showed RNAP V-dependent small RNA suppression are substantially smaller in size (Fig. 4B). A similar trend was observed for all the larger elements, including LTR/Copia, LINE/L1 and DNA/MuDR.
It is worth noting that there was little difference in impacted 
SINE sizes relative to all SINEs, perhaps because the SINEs as 
a class have a median size of just 302 bp. In summary, our data 
showed that the nrpe mutation affects small RNA accumulation 
for smaller repeat elements across all repeat classes (Fig. 4). 
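The trendline in Fig. 4A comes from a power regression of fold difference against repeat length. One common way to fit y = a·x^b is ordinary least squares on ln y versus ln x; the sketch below assumes that approach (the exact fitting routine used for the figure is not stated), and the pseudocount that keeps repeats with no nrpe reads finite is likewise our assumption.

```python
import math


def fold_difference(wt_hna, nrpe_hna, pseudo=1.0):
    """Wild-type / nrpe ratio of summed HNA for one repeat.

    The pseudocount (an assumption) avoids division by zero when
    no small RNAs map to the repeat in the nrpe library.
    """
    return (sum(wt_hna) + pseudo) / (sum(nrpe_hna) + pseudo)


def power_fit(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx              # slope in log-log space = exponent
    a = math.exp(my - b * mx)  # back-transformed intercept = prefactor
    return a, b


# Synthetic check: points generated from y = 500 * x**-0.8 are recovered.
lengths = [200, 300, 500, 1000, 2000, 5000]
folds = [500 * x ** -0.8 for x in lengths]
a, b = power_fit(lengths, folds)
print(f"a = {a:.1f}, b = {b:.2f}")
```

A negative fitted exponent b is the numerical form of the inverse relationship between repeat length and nrpe-dependent suppression shown in Fig. 4A.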

We sought to determine the approximate width of loci with reduced
small RNA levels in the nrpe mutant as a surrogate measurement of
RNAP V transcript length, because such transcripts have not yet been
identified on a genomic scale. From the 2,201 clusters defined above,
we selected 100 RNAP V-dependent clusters with highly differential
small RNA levels in Col vs. nrpe (HNA of Col >250 RP5M and HNA of
nrpe between 0 and 5 RP5M). Consistent with our observation that
RNAP V-dependent regions are predominantly small, almost all of these
clusters are located far apart from one another in the genome. For
each of these 100 clusters, we extracted its 500 bp genomic sequence
plus 500 bp of flanking sequence on each side, giving a 1,500 bp
region in total, and plotted the fold difference between Col and nrpe
small RNA abundances using a sliding window of 100 bp across the
region (gray lines in Fig. 5A and B). Next, the fold-difference
profile of each RNAP V-dependent region was fitted with a Gaussian
curve, yielding a single smooth peak that represents the average
difference in small RNA abundance and the width of the region of
impacted small RNAs at that cluster (Fig. 5A, red lines in the
foreground). The fitted curves were positioned mainly in the central
500 bp, spreading slightly (about 100 bp) upstream or downstream,
indicating that nrpe-directed suppression is concentrated within each
selected 500 bp region. From the height and width of each fitted
curve, we calculated an average curve (Fig. 5B) representing the
average width and RNAP V-dependent small RNA abundance for these 100
clusters. The average width of an RNAP V-dependent locus was ~238 bp
(Fig. 5B); this value approximates the size of a typical RNAP
V-dependent locus in the genome and may represent the length of a
region in which RNAP V is active.
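Fitting a Gaussian to each fold-difference profile and averaging the fitted heights and widths can be sketched as below. The paper used OriginLab for this step; as a stand-in, this sketch uses Caruana's method (a least-squares quadratic fit to ln y, which is exact for a noise-free Gaussian) and reports width as the full width at half maximum (FWHM). Both choices are assumptions: the width definition behind the ~238 bp estimate is not specified here, and OriginLab's Gauss model may parameterize width differently.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355


def _solve3(m, v):
    """Solve the 3x3 system m @ x = v by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_gaussian(xs, ys, scale=1000.0):
    """Return (height, center, sigma) of y = h * exp(-(x - mu)**2 / (2 * sigma**2)).

    Positions are centered and rescaled (to kb by default) to keep the
    normal equations well conditioned; points with y <= 0 are skipped.
    """
    x0 = sum(xs) / len(xs)
    pts = [((x - x0) / scale, math.log(y)) for x, y in zip(xs, ys) if y > 0]
    s = [sum(u ** k for u, _ in pts) for k in range(5)]
    t = [sum(u ** k * ly for u, ly in pts) for k in range(3)]
    c0, c1, c2 = _solve3([[s[0], s[1], s[2]],
                          [s[1], s[2], s[3]],
                          [s[2], s[3], s[4]]], t)
    if c2 >= 0:
        raise ValueError("profile is not peaked; no Gaussian fit")
    sigma = math.sqrt(-1.0 / (2.0 * c2)) * scale
    center = x0 + (-c1 / (2.0 * c2)) * scale
    height = math.exp(c0 - c1 ** 2 / (4.0 * c2))
    return height, center, sigma


def average_curve(fits):
    """Average height and FWHM over clusters, ignoring each peak's position."""
    heights = [h for h, _, _ in fits]
    widths = [sig * FWHM_PER_SIGMA for _, _, sig in fits]
    return sum(heights) / len(heights), sum(widths) / len(widths)


# Synthetic profile: 71 window centers (50, 70, ..., 1450 bp) on a 1,500 bp region.
centers = [50 + 20 * i for i in range(71)]
profile = [8.0 * math.exp(-((x - 750) ** 2) / (2 * 100.0 ** 2)) for x in centers]
h, mu, sigma = fit_gaussian(centers, profile)
print(round(h, 3), round(mu, 1), round(sigma, 1))
```

Dropping each peak's center in `average_curve` mirrors the Fig. 5B legend, where positions were ignored to focus on curve shape.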

SINE and RC/Helitron overrepresentation in other RdDM mutants.
RdDM effectors such as DMS4, DRD1, DMS3 and RDM1 play a role in
scaffold RNA production and in de novo methylation and its
maintenance, possibly by promoting RNAP V transcription and
activation. Because of the functional relationship of these proteins
in RdDM, we were interested in the small RNA profiles of the
corresponding mutants, which have previously been examined only at
selected genomic loci. We generated small RNA libraries from the
mutant alleles dms4-1, drd1-1, dms3-1 and rdm1-4, which have
well-characterized epigenetic phenotypes such as reduced 24 nt siRNA
levels and reduced cytosine methylation. We also generated a control
library ["Wt (T + S)", for wild type plus the target and silencer
transgene construct] for these RdDM mutants, which were described in
a previous mutant screen that assayed the release of transgene
silencing. From the deep sequencing results, the abundance of each
small RNA was normalized to reads per five million (RP5M) (Table S1).
We obtained 5 to 16 million total genome-matched small RNA sequences,
corresponding to 1.3 to 2.7 million distinct sequences. The small RNA
complexity of the RdDM mutant libraries ranged from 0.151 to 0.269,
comparable to the 0.264 of the Wt (T + S) control library. Compared
with the control, the size distribution profiles showed reduced 24 nt
abundance in the drd1-1, dms3-1 and rdm1-4 mutant libraries but not
in the dms4-1 library (Fig. 6A and B; Table S5). The dms4-1 library
had only a slightly lower 24:21 nt abundance ratio than the control
library, indicating that 24 nt siRNA abundance was reduced minimally
in the absence of DMS4 (Fig. 6A and B). However, the reduction in the
percentage of 24 nt abundance was more than 3-fold greater in the
drd1-1, dms3-1 and rdm1-4 libraries compared with their control,
indicating that, as in the nrpe mutant, loss of any of these three
components of the DDR complex substantially impairs heterochromatic
siRNA accumulation. The greater impact of drd1-1, dms3-1 and rdm1-4
on 24 nt abundance may be explained by the fact that two to four
times as many highly abundant small RNA sequences were greatly
reduced in the drd1-1, dms3-1 and rdm1-4 libraries as in the dms4-1
library, especially among 24 nt small RNAs (Table S5A). As a result,
a greater percentage of the highly abundant 24 nt small RNAs is
impacted in the drd1-1, dms3-1 and rdm1-4 libraries than in the
dms4-1 library (Table S5B).
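The library-level statistics in this paragraph can be computed from a table of genome-matched sequences and read counts. In this sketch, complexity is taken to be the number of distinct sequences divided by total reads; that definition is our assumption (it is roughly in line with the reported 1.3 to 2.7 million distinct out of 5 to 16 million total reads, but the text does not define it). The input dictionary and its toy contents are hypothetical, and structural RNAs are assumed to be removed beforehand.

```python
from collections import Counter


def library_summary(counts):
    """Summarize one library.

    counts: {sequence: genome-matched read count}, with structural RNAs
    (tRNA, rRNA, snRNA, snoRNA) assumed already excluded.
    """
    total = sum(counts.values())
    by_len = Counter()
    for seq, n in counts.items():
        by_len[len(seq)] += n  # abundance (reads, not distinct sequences) per size class
    return {
        "total": total,
        "distinct": len(counts),
        "complexity": len(counts) / total,  # assumed: distinct / total reads
        "ratio_24_21": by_len[24] / by_len[21] if by_len[21] else float("inf"),
        "pct_24": 100.0 * by_len[24] / total,
    }


toy = {"A" * 24: 6, "C" * 24: 2, "G" * 21: 4, "T" * 22: 4}
print(library_summary(toy))
```

Comparing `ratio_24_21` and `pct_24` between a mutant and its control library gives the kind of 24 nt reduction summarized in Fig. 6A and B.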

Next, we characterized the small RNA-generating regions that were
dependent on the four RdDM effectors DMS4, DRD1, DMS3 and RDM1. We
first counted the small RNA-generating clusters in the four
individual RdDM mutant libraries (sum of HNA in control + mutant
greater than 100 RP5M; Fig. 6D, criteria A), which identified
approximately 14,000 to 18,000 clusters, compared with the 11,667
small RNA-generating clusters in nrpe mutants. Next, we calculated
the fold difference in the HNA of small RNA clusters between the
control and each of the four RdDM mutants, and plotted the number of
clusters against the fold difference to assess the impact of each
mutation on small RNA abundance. As we observed for the nrpe
libraries, the majority of small RNA-generating loci were not
impacted in the four RdDM mutants, with more than 50% of the clusters
showing a less than 2-fold difference relative to the control (Fig.
6C). On the other hand, a substantial proportion of clusters were
downregulated in the four RdDM mutants, with 8% to 16% of the total
showing at least 10-fold higher small RNA abundance in the control
than in the four RdDM mutant libraries (Fig. 6C and D, criteria B).
This result suggested that, although there were more small
RNA-generating loci in the four RdDM mutant libraries, the number of
mutant-dependent loci was smaller in the four RdDM mutants than in
nrpe; i.e., mutant-dependent suppression of small RNA abundance was
less severe in the dms4-1, drd1-1, dms3-1 and rdm1-4 mutants.

Figure 5. Assessment of the size of RNAP V-dependent regions. (A) Plot showing the degree of nrpe-dependent suppression of small RNA abundance
across the width of RNAP V-dependent regions. One hundred highly RNAP V-dependent 500 bp clusters were selected (average summed HNA in
nrpe libraries less than 5 RP5M and HNA in Col greater than 250 RP5M). Each selected cluster was joined with its 500 bp upstream and 500 bp
downstream sequences to form a 1,500 bp region. The degree of nrpe-dependent small RNA suppression is presented as the rounded fold difference between
Col and nrpe libraries in a 100 bp window, sliding every 20 bp across the entire 1,500 bp region (gray background lines). For each RNAP
V-dependent region, a Gaussian distribution was fitted to the gray fold-difference line and is shown as the red curve in the foreground. (B) The average curve
(red) was calculated from the data in (A) to represent the average size of an RNAP V-dependent region. The average width of an RNAP V-dependent
locus was computed from the individual heights and widths of each fitted curve in (A), independent of the exact position within the cluster (positions
were ignored to focus on the shape of the curve for each cluster). The width of the average curve was calculated as 238 bp.
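Criteria A and B, and the fold-difference tally behind Fig. 6C, can be expressed as a short Python sketch. The input dictionaries, the pseudocount and the pooling of rounded fold differences of 10 or more into one bin are assumptions for illustration, not details taken from the authors' scripts.

```python
from collections import Counter


def classify_clusters(control, mutant, min_sum=100.0, fold_cut=10.0, pseudo=1.0):
    """control, mutant: {cluster_id: summed HNA in RP5M}.

    Returns (criteria-A clusters, criteria-B clusters, tally of rounded
    fold differences with rounded values >= 10 pooled into the bin 10).
    """
    ids = control.keys() | mutant.keys()
    # Criteria A: small RNA-generating if control + mutant HNA exceeds min_sum.
    expressed = {c for c in ids if control.get(c, 0.0) + mutant.get(c, 0.0) > min_sum}
    # Fold difference control / mutant; the pseudocount keeps zero-read clusters finite.
    folds = {c: (control.get(c, 0.0) + pseudo) / (mutant.get(c, 0.0) + pseudo)
             for c in expressed}
    # Criteria B: at least fold_cut-fold higher abundance in the control.
    dependent = {c for c, f in folds.items() if f >= fold_cut}
    # Round half up (int(f + 0.5)) and pool rounded values >= 10 into one bin.
    tally = Counter(min(int(f + 0.5), 10) for f in folds.values())
    return expressed, dependent, tally


control = {"c1": 300.0, "c2": 60.0, "c3": 50.0, "c4": 5.0}
mutant = {"c1": 9.0, "c2": 55.0, "c3": 40.0, "c4": 1.0}
expressed, dependent, tally = classify_clusters(control, mutant)
print(sorted(expressed), sorted(dependent), dict(tally))
```

Clusters c3 and c4 fail criteria A here because their combined HNA does not exceed 100 RP5M, so they never enter the fold-difference tally.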

We were interested in determining whether the small RNA loci
suppressed in the four RdDM mutants were also suppressed in the nrpe
mutant, which would indicate the degree to which small RNA production
depends on both RNAP V and the other RdDM effectors. Therefore, we
performed pairwise comparisons of small RNA clusters that were highly
suppressed both in nrpe and in one of the four RdDM mutants (Fig. 6D,
criteria C). For the drd1-1 mutant, 1,067 of 1,871 (57%)
drd1-suppressed clusters were also suppressed in nrpe. Overall,
around 60% of dms3-, rdm1- or dms4-suppressed clusters were also
suppressed in nrpe, a high degree of overlap. There were 860 small
RNA-generating clusters suppressed in nrpe, dms4-1 and dms3-1 (Fig.
6E), and 1,348 clusters suppressed in drd1-1, dms3-1 and rdm1-4 (Fig.
6F). Therefore, our data show that dms3, rdm1 and drd1 affect
predominantly the same small RNA-generating regions, and that these
regions are in fact a subset of dms4-suppressed regions. This is
consistent with the function of DRD1, DMS3 and RDM1 together in the
DDR complex.
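The overlap percentages and Venn-diagram counts above reduce to set operations on cluster identifiers. The sketch below uses Python sets; the identifiers are placeholders, and the synthetic example merely reproduces the drd1-1 figure of 1,067 shared clusters out of 1,871.

```python
def overlap_fraction(mutant_clusters, nrpe_clusters):
    """Fraction of a mutant's suppressed clusters that are also nrpe-suppressed."""
    return len(mutant_clusters & nrpe_clusters) / len(mutant_clusters)


def venn3(a, b, c):
    """Sizes of the seven exclusive regions of a three-set Venn diagram."""
    return {
        "abc": len(a & b & c),
        "ab_only": len((a & b) - c),
        "ac_only": len((a & c) - b),
        "bc_only": len((b & c) - a),
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
    }


drd1 = set(range(1871))                           # placeholder cluster IDs
nrpe = set(range(1067)) | set(range(5000, 7000))  # shares IDs 0..1066 with drd1
print(f"{100 * overlap_fraction(drd1, nrpe):.0f}%")  # prints 57%
```

The seven `venn3` regions are disjoint, so they sum to the size of the union of the three sets, which is what an area-proportional diagram such as Fig. 6E or F partitions.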

Finally, we categorized these RdDM mutant-suppressed small RNA
clusters according to their association with genomic repeats. We
observed a similar trend of suppressed repeat types in the four RdDM
mutant libraries as in nrpe: SINEs and RC/Helitrons were
overrepresented in RNAP V-dependent and RdDM effector-dependent loci
(Fig. 6G). Conversely, LTR/Copia- and LTR/Gypsy-associated clusters
were underrepresented in RNAP V-dependent and RdDM effector-dependent
clusters (Fig. 6G). Therefore, our results show that small RNA
production was impacted at shorter repeats not only in the nrpe
mutant but also in the RdDM mutants dms4, drd1, dms3 and rdm1.
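The over- and underrepresentation in Fig. 6G compares the share of clusters annotated with each repeat class in a dependent set against a reference set. A minimal sketch follows; as in the Fig. 6G legend, percentages are taken over all clusters in a set (annotated or not), while the choice of reference set and the ratio used to express enrichment are our illustrative assumptions.

```python
from collections import Counter


def class_percentages(annotation, cluster_ids):
    """Percent of clusters in cluster_ids carrying each repeat-class annotation.

    annotation: {cluster_id: repeat class}; unannotated clusters are absent
    from it but still count toward the denominator.
    """
    ids = list(cluster_ids)
    counts = Counter(annotation[c] for c in ids if c in annotation)
    return {cls: 100.0 * k / len(ids) for cls, k in counts.items()}


def enrichment(dependent_pct, reference_pct):
    """Ratio > 1 suggests overrepresentation, < 1 underrepresentation."""
    return {cls: dependent_pct.get(cls, 0.0) / pct
            for cls, pct in reference_pct.items() if pct > 0}


annotation = {"c1": "SINE", "c2": "SINE", "c3": "LTR/Gypsy",
              "c4": "LTR/Gypsy", "c5": "LTR/Gypsy"}
all_clusters = ["c1", "c2", "c3", "c4", "c5", "c6"]  # c6 has no repeat annotation
dependent = ["c1", "c2", "c6"]
ratios = enrichment(class_percentages(annotation, dependent),
                    class_percentages(annotation, all_clusters))
print(ratios)
```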
Discussion

We characterized the RNAP V-dependent, small RNA-generating loci in
Arabidopsis by deep sequencing of small RNAs in wild type and in the
null RNAP V mutant allele nrpe. The loss-of-function mutation in the
NRPE1 gene caused a substantial reduction in 24 nt small RNA
accumulation at about 18% of small RNA-generating loci. RNAP V seems
to affect small RNA production mainly from intergenic regions in
euchromatin: 73% of RNAP V-dependent loci are in intergenic regions
and 27% are in gene coding regions. We also examined the genomic
distribution of RNAP V-dependent loci. Pericentromeric regions are
dense in repetitive sequences and are therefore heavily methylated and
rich in heterochromatin. We show that RNAP V- and RNAP IV-dependent
loci were both less concentrated in pericentromeric regions,
especially in comparison with RDR2-dependent loci, and that RNAP V-
and RNAP IV-dependent loci appeared to be dispersed across the genome
rather than enriched in heterochromatin-dense regions near
centromeres. The dispersed distribution of both RNAP V- and RNAP
IV-dependent loci is reminiscent of the distribution of DRM1/2 target
regions along the chromosome, in contrast to the CMT3- and KYP-target
loci, which reside mostly in the centromere-proximal 2 Mb. While
non-CG methylation is generally maintained by CMT3 in pericentric
heterochromatin, it is maintained by the DRM1/2 DNA
methyltransferases particularly in small, silenced regions that span
only a few nucleosomes. Collectively, our results suggest that RNAP
V, along with RNAP IV and DRM1/2, may work via RdDM to target for
silencing a subset of small transposable elements located in
intergenic regions dispersed across the genome, which are likely
maintained by non-CG methylation.

Although RNAP V- and RNAP IV-dependent loci are similar in
chromosomal position, they apparently differ in repeat type, in
addition to possible differences in as-yet uncharacterized features
such as epigenetic marks. Both RNAP IV- and RNAP V-dependent loci
have been shown to be associated with transposable repeat sequences,
but the repeat associations differed between the two classes of loci
(Fig. 3). LTR retrotransposons were overrepresented in RNAP
IV-dependent and RNAP V-independent loci, while non-LTR
retrotransposons and helitrons were overrepresented in RNAP IV- and
RNAP V-dependent loci. Our results showed an overrepresentation of
SINEs and RC/Helitrons and an underrepresentation of the Gypsy class
of retroelements at RNAP V-dependent loci. One plausible explanation
for the different repeat types associated with RNAP IV or RNAP V
comes from our finding of an inverse correlation between repeat size
and the suppression of small RNA production in nrpe. RNAP IV-directed
silencing is clearly important for the repression of long repetitive
sequences such as LTR elements. On the other hand, RNAP V-directed
silencing appears to be focused on small repeats such as SINEs and
RC/Helitrons (Fig. 3). As proposed previously by Zilberman and
Henikoff, shorter silenced regions only a few nucleosomes in length
might be difficult to maintain during replication because a short
piece of DNA and its associated histones carry only a small number of
epigenetic marks; such regions would therefore require active siRNA
targeting to reinforce silencing. In this case, RNAP IV may be
required for siRNA production and RNA-directed silencing to suppress
larger repetitive elements, whereas our data suggest that RNAP V may
be essential for siRNA production in smaller and/or more euchromatic
regions. Ahmed et al. showed that, among the repeat classes, more
than 90% of gypsy elements are densely methylated, SINEs are
moderately methylated (75% methylated and 15% unmethylated), and
RC/Helitrons are the least methylated sequences (40% methylated and
40% unmethylated). Their results agree with the distinctive patterns
of repeat association of the RNAP V-dependent small RNA-generating
clusters that we observed. In other words, LTRs could be subjected to
robust silencing by RNAP IV that renders them heavily methylated in
heterochromatic regions, while RNAP V may direct silencing at
RC/Helitrons and SINEs, TEs that are found more frequently in
euchromatin.

This concept is reminiscent of a model of reversible silencing of
euchromatic genes by RNAP V action. Indeed, Huettel et al. proposed
that RNAP V, along with DRD1, establishes a basal methylation state
at targets such as euchromatic promoters or intergenic repeats in
gene-rich regions, which can be reversed when methylation marks change
in response to developmental cues or environmental stresses. On the
other hand, sequences such as LTRs in repeat-rich regions are
subjected to additional epigenetic modification and persistent
suppression to maintain the stable silencing state of heterochromatin.
Taken together, these results suggest that RNAP V-directed,
siRNA-mediated silencing may provide dynamic regulation of DNA
methylation by targeting repeats and/or enhancers in promoters or
intergenic regions, which may in turn regulate the expression of
neighboring genes.

Our results indicate that RNAP V-dependent regions are small,
repeat-associated regions in the Arabidopsis genome, with an average
size of ~238 bp. The fact that the nrpe mutation greatly impacted
smaller repeat-associated siRNA-generating regions across all repeat
classes implies that repeat size is relevant to RNAP V-targeted
silencing. Size has been a factor in studies of the relationship
between nucleosome positioning and DNA methylation: nucleosome
positioning strongly affects patterns of DNA methylation throughout
the genome, and the 147 bp periodicity of methylation patterns
matches the length of DNA wrapped around one nucleosome. The
estimated length of RNAP V-dependent regions could correspond to more
than one nucleosome, but more data are needed to link the size of
RNAP V-dependent loci to nucleosome positioning and methylation
periodicity.
One related insight into the small size of RNAP V-dependent
repeat-associated regions may come from a study of siRNAs and
methylated TEs. Ahmed et al. showed that unmethylated and poorly
methylated TE sequences are smaller than their methylated
counterparts, with the former having a median size of less than
500 bp. Perhaps RNAP V-dependent regions are specifically associated
with smaller repeat sequences regardless of their repeat class
(Fig. 4). It has been demonstrated that methylation is preferentially
reduced in small TEs when the RdDM components AGO4 or DRM1/2 are
defective, indicating that the RdDM pathway downstream of siRNA
biogenesis is required to maintain the methylation state of small but
not long TEs. Therefore, it is possible that one of the key functions
of RNAP V targeting in RdDM is to silence small TEs in euchromatin
that are too broadly distributed and/or too small to be stably
incorporated into heterochromatin. In the future, methylation and
histone profiles of RNAP V and related mutants will allow a more
detailed understanding of the function of RNAP V-dependent
siRNA-directed silencing.

Figure 6. RdDM mutants show trends similar to the nrpe mutant in the suppression of SINE-annotated siRNA-generating clusters.
(A) Reduction of 24 nt small RNAs in RdDM mutant libraries. Shown for the control ["Wt (T + S)"] and RdDM mutant libraries (drd1-1, dms3-1, rdm1-4 and
dms4-1) are the 24 nt to 21 nt ratios of genome-matched small RNA abundance (excluding structural RNAs). (B) Small RNA size profiles in control and
RdDM mutant libraries. For each size class of small RNAs, the percentage of small RNA abundance (excluding structural RNAs) relative to the abundance of
total genome-matched reads was calculated and normalized to the abundance of 21 nt small RNAs in wild-type libraries. (C) Number of clusters impacted in
the RdDM mutants, by fold difference in small RNA abundance between control and RdDM mutant libraries. For each cluster, the fold difference
in hits-normalized small RNA abundance between the control and RdDM libraries was calculated and rounded. Total numbers of clusters were tallied
by fold difference and plotted on the y-axis for each RdDM mutant library. These data, in particular the >10 category, are analogous to
Figure 1C. (D) Pairwise comparison of RNAP V- and RdDM effector-dependent clusters. Numbers of clusters were calculated based on different criteria in the nrpe,
drd1-1, dms3-1, rdm1-4 and dms4-1 libraries. The numbers of small RNA-generating clusters (criteria A) and RdDM effector-dependent, small RNA-generating
clusters (criteria B) are shown for each RdDM mutant library. Pairwise comparison of nrpe vs. RdDM libraries (criteria C) shows the number of clusters
in which small RNA abundance is greatly reduced in both nrpe and an RdDM mutant such as drd1-1, i.e., RNAP V- and DRD1-dependent small RNA-generating
clusters. (E and F) Area-proportional Venn diagrams show the numbers of small RNA-generating clusters suppressed in nrpe, dms4-1 and
dms3-1 (E) and in drd1-1, dms3-1 and rdm1-4 (F, left). The number in each sector of the Venn diagram in (F) is also shown in a representative Venn diagram
(F, right). (G) Classes of repetitive sequences represented in RdDM effector-dependent small RNA-generating clusters. RNAP V-dependent (RNAP
V-dpt) and RNAP V-independent (RNAP V-indpt) clusters are defined as described in the Figure 1 legend. RdDM effector-dependent clusters were
selected with criteria B as described above. Repetitive sequences were identified by the TAIR9 repeat annotation. The proportion of repeat-annotated
clusters relative to the total number of RdDM-dependent loci for each control-RdDM mutant pair is shown on the y-axis (percentage).

The current view is that DMS4, DRD1, DMS3 and RDM1 are RdDM effectors
involved in facilitating RNAP V transcription of scaffold RNAs and in
recruiting silencing complexes to target genomic sites. Indeed, for
the four RdDM effectors we examined, mutant-dependent suppression of
small RNA abundance was less severe in the dms4-1, drd1-1, dms3-1 and
rdm1-4 mutants than in the nrpe mutant, consistent with their
predicted roles downstream of siRNA biogenesis. Most importantly,
dms3, rdm1, drd1 and dms4 affected predominantly the same types of
small RNA-generating regions as nrpe, especially small SINE and
RC/Helitron repeats, and had less impact at long repeats such as
LTR/Gypsy (Fig. 6). Our results indicate that these RdDM effectors may
affect siRNA abundance mainly through their function together with
RNAP V at certain RdDM target loci. However, a portion of the small
RNA-generating clusters suppressed in dms3, rdm1, drd1 and dms4 did
not overlap with RNAP V-dependent clusters, suggesting that these
RdDM effectors play a role in siRNA accumulation that is partially
non-redundant with RNAP V. One possibility is that they work together
with other RNA polymerases in RdDM; for example, DMS4 interacts with
both RNAP II and RNAP V and may regulate their abundance and/or
polymerase activity (references 27 and 58). The relationship between
RdDM effector-dependent small RNA levels at specific genomic loci and
epigenetic marks such as DNA methylation and histone modifications has
yet to be fully elucidated. Future studies of these RdDM effectors
will provide a better understanding of their functions in small
RNA-directed silencing.

Materials and Methods 

Mutants and plant growth conditions. The mutant alleles nrpd1b-11
(SALK_029919), nrpd1a-4 (SALK_083051) and rdr2-1 (SAIL_1277_H08) used
in this study were in the Arabidopsis thaliana ecotype Columbia
background, as described previously in references 7 and 16. dms4-1,
dms3-1, rdm1-4 and drd1-1 were described previously in references 22,
27, 30 and 40. Plants were grown in a growth chamber with 16 h of
light for five weeks. Immature inflorescence tissues, including the
inflorescence meristem and early-stage floral buds (up to stage
11/12), were collected. Total RNA was isolated using Tri-reagent
(Molecular Research Center) according to the manufacturer's
instructions.

Small RNA data generation and analysis. Small RNA libraries were
constructed as described in reference 59, and SBS sequencing was
performed on an Illumina GAIIx at the University of Delaware. The SBS
data were processed and normalized as described previously in
references 60 and 61. In brief, the raw sequencing data were converted
to SCARF format, trimmed of adapters and matched to the genome
(Arabidopsis TAIR version 9, "TAIR9"), and read counts were normalized
based on the total abundance of genome-matched small RNA reads,
excluding structural sRNAs originating from annotated tRNA, rRNA,
small nuclear (sn) and small nucleolar (sno) RNAs. The
"hits-normalized abundance" (HNA) value of each small RNA was
calculated by dividing its normalized abundance (in RP5M) by its
number of hits, where the number of hits is simply the number of loci
at which a given sequence perfectly matches the genome. Repeat
annotation was based on the repeat information in the TAIR9 genome.
These sequence data are available in NCBI's Gene Expression Omnibus
(GEO) database under the accession number GSE36424.
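The two normalization steps described above, reads per five million (RP5M) and hits-normalized abundance (HNA), amount to a rescaling followed by a division. The sketch assumes the raw counts are already restricted to genome-matched reads and that a set of structural-RNA sequences and a per-sequence hit count are available; all names are hypothetical.

```python
def rp5m(raw_counts, structural=frozenset()):
    """Scale raw counts to reads per five million genome-matched, non-structural reads."""
    kept = {s: n for s, n in raw_counts.items() if s not in structural}
    total = sum(kept.values())
    return {s: n * 5_000_000 / total for s, n in kept.items()}


def hna(normalized, hits):
    """Divide each sequence's RP5M value by its number of perfect genome matches."""
    return {s: value / hits[s] for s, value in normalized.items()}


raw = {"seqA": 30, "seqB": 10, "rRNA_frag": 60}
norm = rp5m(raw, structural={"rRNA_frag"})
print(hna(norm, hits={"seqA": 3, "seqB": 1}))
```

Dividing by the hit count spreads a multi-mapping sequence's abundance evenly across its loci, so summing HNA over genomic regions counts that sequence's total abundance only once.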

Clustering and differential expression analysis was performed using
custom Perl and database scripts. A static clustering approach was
implemented to calculate the hits-normalized abundance of all the
small RNA heads in every 500 bp cluster, as described in the main
text. A cluster was annotated with a repeat only if more than 100 nt
of the 500 nt cluster was marked as a known repeat.
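A minimal sketch of the static 500 bp clustering and the repeat-annotation rule follows. It assumes 1-based, inclusive genome coordinates, interprets a small RNA "head" as the 5' coordinate used to place each read into a bin, and merges overlapping repeat intervals before counting covered bases; these are our assumptions, not documented details of the original Perl scripts.

```python
from collections import defaultdict

BIN = 500  # static cluster width in bp


def static_clusters(hits):
    """hits: iterable of (chrom, head_pos, hna) with 1-based head positions.

    Returns {(chrom, bin_start): summed HNA}, bins being 1-500, 501-1000, ...
    """
    sums = defaultdict(float)
    for chrom, pos, value in hits:
        bin_start = ((pos - 1) // BIN) * BIN + 1
        sums[(chrom, bin_start)] += value
    return dict(sums)


def covered_bp(bin_start, repeats):
    """Bases of [bin_start, bin_start + BIN - 1] covered by repeat intervals (1-based, inclusive)."""
    bin_end = bin_start + BIN - 1
    clipped = sorted((max(s, bin_start), min(e, bin_end))
                     for s, e in repeats if e >= bin_start and s <= bin_end)
    covered, cur_s, cur_e = 0, None, None
    for s, e in clipped:
        if cur_e is None or s > cur_e + 1:  # disjoint from the current run
            if cur_e is not None:
                covered += cur_e - cur_s + 1
            cur_s, cur_e = s, e
        else:                               # overlapping or adjacent: extend the run
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s + 1
    return covered


def is_repeat_cluster(bin_start, repeats, min_bp=100):
    """Annotate a cluster as repeat-associated if more than min_bp bases are repeat."""
    return covered_bp(bin_start, repeats) > min_bp


clusters = static_clusters([("Chr1", 1, 2.0), ("Chr1", 500, 3.0),
                            ("Chr1", 501, 4.0), ("Chr2", 1200, 1.0)])
print(clusters)
print(is_repeat_cluster(1, [(50, 120), (100, 180)]))  # 131 bp covered
```

Merging intervals first prevents two overlapping repeat annotations from being counted twice toward the 100 nt threshold.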

To calculate the width of RNAP V-dependent regions, we selected
regions of 1,500 bp in total by adding 500 bp both 5' and 3' to each
selected 500 bp cluster, as described in the main text. The points in
each peak represent the fold difference between Col and nrpe (Y value)
within a 100 bp window, recalculated every 20 bp across the 1,500 bp
region. The peaks were smoothed by fitting a Gaussian distribution to
the moving-average data. To generate these graphs, a static
sliding-window clustering method was implemented to recompute these
values within smaller bins of 100 bp, sliding every 20 bp across the
selected 1,500 bp regions. The analysis was done using custom Perl
scripts and MySQL database queries, while the graphs and the Gaussian
fits to individual peaks were generated using OriginLab software.
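The sliding-window recomputation can be sketched as follows: 100 bp windows advanced in 20 bp steps across a 1,500 bp region give 71 windows. The input format (0-based offsets within the region paired with HNA values), the half-open window boundaries and the pseudocount are assumptions for illustration; the resulting (center, fold) points are the kind of profile to which a Gaussian would then be fitted.

```python
def sliding_fold_difference(col, nrpe, region_len=1500, win=100, step=20, pseudo=1.0):
    """col, nrpe: lists of (offset, hna), offsets 0-based within the region.

    Returns [(window_center, fold_difference)] with windows [start, start + win).
    """
    profile = []
    for start in range(0, region_len - win + 1, step):
        end = start + win
        c = sum(v for pos, v in col if start <= pos < end)
        n = sum(v for pos, v in nrpe if start <= pos < end)
        # Pseudocount (assumed) avoids division by zero where nrpe has no reads.
        profile.append((start + win / 2, (c + pseudo) / (n + pseudo)))
    return profile


profile = sliding_fold_difference(col=[(510, 100.0), (700, 50.0)], nrpe=[(700, 4.0)])
print(len(profile), profile[0][0], profile[-1][0])
```

Because consecutive windows overlap by 80 bp, each small RNA contributes to up to five windows, which smooths the profile before any curve fitting.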